22 research outputs found

    plantiSMASH: automated identification, annotation and expression analysis of plant biosynthetic gene clusters

    Plant specialized metabolites are chemically highly diverse, play key roles in host-microbe interactions, have important nutritional value in crops and are frequently applied as medicines. It has recently become clear that plant biosynthetic pathway-encoding genes are sometimes densely clustered in specific genomic loci: biosynthetic gene clusters (BGCs). Here, we introduce plantiSMASH, a versatile online analysis platform that automates the identification of candidate plant BGCs. Moreover, it allows integration of transcriptomic data to prioritize candidate BGCs based on the coexpression patterns of predicted biosynthetic enzyme-coding genes, and facilitates comparative genomic analysis to study the evolutionary conservation of each cluster. Applied to 48 high-quality plant genomes, plantiSMASH identifies a rich diversity of candidate plant BGCs. These results will guide further experimental exploration of the nature and dynamics of gene clustering in plant metabolism. Moreover, spurred by the continuing decrease in costs of plant genome sequencing, they will allow genome mining technologies to be applied to plant natural product discovery.

    antiSMASH 4.0—improvements in chemistry prediction and gene cluster boundary identification

    Many antibiotics, chemotherapeutics, crop protection agents and food preservatives originate from molecules produced by bacteria, fungi or plants. In recent years, genome mining methodologies have been widely adopted to identify and characterize the biosynthetic gene clusters encoding the production of such compounds. Since 2011, the ‘antibiotics and secondary metabolite analysis shell’ (antiSMASH) has assisted researchers in efficiently performing this, both as a web server and a standalone tool. Here, we present the thoroughly updated antiSMASH version 4, which adds several novel features, including prediction of gene cluster boundaries using the ClusterFinder method or the newly integrated CASSIS algorithm, improved substrate specificity prediction for non-ribosomal peptide synthetase adenylation domains based on the new SANDPUMA algorithm, improved predictions for terpene and ribosomally synthesized and post-translationally modified peptides cluster products, reporting of sequence similarity to proteins encoded in experimentally characterized gene clusters on a per-protein basis and a domain-level alignment tool for comparative analysis of trans-AT polyketide synthase assembly line architectures. Additionally, several usability features have been updated and improved. Together, these improvements make antiSMASH up-to-date with the latest developments in natural product research and will further facilitate computational genome mining for the discovery of novel bioactive molecules.

    Biosynthetic potential of the global ocean microbiome

    8 pages, 4 figures, supplementary information https://doi.org/10.1038/s41586-022-04862-3. This Article is contribution number 130 of Tara Oceans.
    Natural microbial communities are phylogenetically and metabolically diverse. In addition to underexplored organismal groups [1], this diversity encompasses a rich discovery potential for ecologically and biotechnologically relevant enzymes and biochemical compounds [2,3]. However, studying this diversity to identify genomic pathways for the synthesis of such compounds [4] and assigning them to their respective hosts remains challenging. The biosynthetic potential of microorganisms in the open ocean remains largely uncharted owing to limitations in the analysis of genome-resolved data at the global scale. Here we investigated the diversity and novelty of biosynthetic gene clusters in the ocean by integrating around 10,000 microbial genomes from cultivated and single cells with more than 25,000 newly reconstructed draft genomes from more than 1,000 seawater samples. These efforts revealed approximately 40,000 putative mostly new biosynthetic gene clusters, several of which were found in previously unsuspected phylogenetic groups. Among these groups, we identified a lineage rich in biosynthetic gene clusters (‘Candidatus Eudoremicrobiaceae’) that belongs to an uncultivated bacterial phylum and includes some of the most biosynthetically diverse microorganisms in this environment. From these, we characterized the phospeptin and pythonamide pathways, revealing cases of unusual bioactive compound structure and enzymology, respectively. Together, this research demonstrates how microbiomics-driven strategies can enable the investigation of previously undescribed enzymes and natural products in underexplored microbial groups and environments.
    This work was supported by funding from the ETH and the Helmut Horten Foundation; the Swiss National Science Foundation (SNSF) through project grants 205321_184955 to S.S., 205320_185077 to J.P. and the NCCR Microbiomes (51NF40_180575) to S.S.; by the Gordon and Betty Moore Foundation (https://doi.org/10.37807/GBMF9204) and the European Union’s Horizon 2020 research and innovation programme under grant agreement no. 101000392 (MARBLES) to J.P.; by an ETH research grant ETH-21 18-2 to J.P.; and by the Peter and Traudl Engelhorn Foundation and by the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement no. 897571 to C.C.F. S.L.R. was supported by an ETH Zurich postdoctoral fellowship 20-1 FEL-07. M.L., L.M.C. and G.Z. were supported by EMBL Core Funding and the German Research Foundation (DFG, Deutsche Forschungsgemeinschaft, project no. 395357507, SFB 1371 to G.Z.). M.B.S. was supported by the NSF grant OCE#1829831. C.B. was supported by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement Diatomic, no. 835067). S.G.A. was supported by the Spanish Ministry of Economy and Competitiveness (PID2020-116489RB-I00). M.K. and H.M. were funded by the SNSF grant 407540_167331 as part of the Swiss National Research Programme 75 ‘Big Data’. M.K., H.M. and A.K. are also partially funded by ETH core funding (to G. Rätsch). With the institutional support of the ‘Severo Ochoa Centre of Excellence’ accreditation (CEX2019-000928-S). Peer reviewed.

    MIBiG 3.0: a community-driven effort to annotate experimentally validated biosynthetic gene clusters

    With an ever-increasing amount of (meta)genomic data being deposited in sequence databases, (meta)genome mining for natural product biosynthetic pathways occupies a critical role in the discovery of novel pharmaceutical drugs, crop protection agents and biomaterials. The genes that encode these pathways are often organised into biosynthetic gene clusters (BGCs). In 2015, we defined the Minimum Information about a Biosynthetic Gene cluster (MIBiG): a standardised data format that describes the minimally required information to uniquely characterise a BGC. We simultaneously constructed an accompanying online database of BGCs, which has since been widely used by the community as a reference dataset for BGCs and was expanded to 2021 entries in 2019 (MIBiG 2.0). Here, we describe MIBiG 3.0, a database update comprising large-scale validation and re-annotation of existing entries and 661 new entries. Particular attention was paid to the annotation of compound structures and biological activities, as well as protein domain selectivities. Together, these new features keep the database up-to-date, and will provide new opportunities for the scientific community to use its freely available data, e.g. for the training of new machine learning models to predict sequence-structure-function relationships for diverse natural products. MIBiG 3.0 is accessible online at https://mibig.secondarymetabolites.org/

    Mapping natural product diversity through genomics

    Background: Natural products (NPs) from plants and microbes are a rich source of bioactive compounds essential for human life. A large part of agriculture, lifestyle and healthcare practice relies on metabolites derived from natural sources. To examine the biosynthetic potential of organisms and to guide NP discovery efforts, researchers increasingly use metabolomic, transcriptomic and genomic approaches. The co-location of metabolic genes in microbial genomes (termed a Biosynthetic Gene Cluster, or BGC) paves the way for inexpensive, high-throughput surveys of natural products. While focused analyses that target a specific family of known compound chemistry have proven successful in optimizing a compound’s utility, a truly global overview that would reveal the actual extent of novel chemistry lying unexplored in nature is still hampered by the limitations of the (high-quality) data, techniques and bioinformatic tools currently available. Results: plantiSMASH, a computational prediction tool, was developed to enable the exploration of putative plant BGCs; it combines genomic and transcriptomic data to give insights into plant secondary metabolism and evolution. To support large-scale annotation and analysis of BGCs, a reference database of known BGCs (MIBiG) was markedly improved in both quality and quantity, providing a 73% data increase over its initial release version. A large-scale study of BGC and Gene Cluster Family (GCF) diversity across taxa was carried out, enabled by the development of a novel bioinformatics tool that can process 1.2 million BGCs within ten days of computing time. Finally, an online database of more than 25,000 GCFs was released for the first time, giving the community a means to do crowdsourced curation, which in turn would be useful in the annotation and discovery of putative or novel BGCs of their own.
    Conclusions: The work presented in this thesis provides the foundation for global, diversity-informed NP discovery efforts. Research aimed at discovering novel products from nature now has a better compass to guide its direction. With the increasing accessibility of long-read sequencing technology, it is now possible to do biosynthetic discovery and functional analysis of microbial BGCs straight from the environment. Combined with the continual improvement of metabolomics, this work will be one half of a truly global, large-scale meta-analysis linking genomes and metabolites present in the microbial world. Finally, although our work did not shift the established notion that BGCs are an exclusive feature of prokaryotic genomes, we find that there is still some level of genome organization in plants that could be useful in biosynthetic pathway analysis, especially when combined with transcriptomics data.

    BiG-FAM: the biosynthetic gene cluster families database

    Computational analysis of biosynthetic gene clusters (BGCs) has revolutionized natural product discovery by enabling the rapid investigation of secondary metabolic potential within microbial genome sequences. Grouping homologous BGCs into Gene Cluster Families (GCFs) facilitates mapping their architectural and taxonomic diversity and provides insights into the novelty of putative BGCs, through dereplication with BGCs of known function. While multiple databases exist for exploring BGCs from publicly available data, no public resources exist that focus on GCF relationships. Here, we present BiG-FAM, a database of 29,955 GCFs capturing the global diversity of 1,225,071 BGCs predicted from 209,206 publicly available microbial genomes and metagenome-assembled genomes (MAGs). The database offers rich functionalities, such as multi-criterion GCF searches, direct links to BGC databases such as antiSMASH-DB, and rapid GCF annotation of user-supplied BGCs from antiSMASH results. BiG-FAM can be accessed online at https://bigfam.bioinformatics.nl

    The antiSMASH database version 3: increased taxonomic coverage and new query features for modular enzymes

    Microorganisms produce natural products that are frequently used in the development of antibacterial, antiviral, and anticancer drugs, pesticides, herbicides, or fungicides. In recent years, genome mining has evolved into a prominent method to access this potential. antiSMASH is one of the most popular tools for this task. Here, we present version 3 of the antiSMASH database, providing a means to access and query precomputed antiSMASH-5.2-detected biosynthetic gene clusters from representative, publicly available, high-quality microbial genomes via an interactive graphical user interface. In version 3, the database contains 147 517 high quality BGC regions from 388 archaeal, 25 236 bacterial and 177 fungal genomes and is available at https://antismash-db.secondarymetabolites.org/

    BiG-SLiCE: A highly scalable tool maps the diversity of 1.2 million biosynthetic gene clusters

    BACKGROUND: Genome mining for biosynthetic gene clusters (BGCs) has become an integral part of natural product discovery. The >200,000 microbial genomes now publicly available hold information on abundant novel chemistry. One way to navigate this vast genomic diversity is through comparative analysis of homologous BGCs, which allows identification of cross-species patterns that can be matched to the presence of metabolites or biological activities. However, current tools are hindered by a bottleneck caused by the expensive network-based approach used to group these BGCs into gene cluster families (GCFs). RESULTS: Here, we introduce BiG-SLiCE, a tool designed to cluster massive numbers of BGCs. By representing them in Euclidean space, BiG-SLiCE can group BGCs into GCFs in a non-pairwise, near-linear fashion. We used BiG-SLiCE to analyze 1,225,071 BGCs collected from 209,206 publicly available microbial genomes and metagenome-assembled genomes within 10 days on a typical 36-core CPU server. We demonstrate the utility of such analyses by reconstructing a global map of secondary metabolic diversity across taxonomy to identify uncharted biosynthetic potential. BiG-SLiCE also provides a "query mode" that can efficiently place newly sequenced BGCs into previously computed GCFs, plus a powerful output visualization engine that facilitates user-friendly data exploration. CONCLUSIONS: BiG-SLiCE opens up new possibilities to accelerate natural product discovery and offers a first step towards constructing a global and searchable interconnected network of BGCs. As more genomes are sequenced from understudied taxa, more information can be mined to highlight their potentially novel chemistry. BiG-SLiCE is available via https://github.com/medema-group/bigslice.

    Compendium of specialized metabolite biosynthetic diversity encoded in bacterial genomes

    Bacterial specialized metabolites are a proven source of antibiotics and cancer therapeutics, but whether we have sampled all the secondary metabolite chemical diversity of cultivated bacteria is not known. We analysed ~170,000 bacterial genomes and ~47,000 metagenome assembled genomes (MAGs) using a modified BiG-SLiCE and the new clust-o-matic algorithm. We found that only 3% of the natural products potentially encoded in bacterial genomes have been experimentally characterized. We show that the variation of secondary metabolite biosynthetic diversity drops significantly on a genus level, identifying it as an appropriate taxonomic rank for comparison. Equal comparison of genera based on Relative Evolutionary Distance revealed that Streptomyces bacteria encode the largest biosynthetic diversity by far, with Amycolatopsis, Kutzneria and Micromonospora also encoding substantial chemical diversity. Finally, we find that several less-well-studied taxa such as Weeksellaceae (Bacteroidota), Myxococcaceae (Myxococcota), Pleurocapsa and Nostocaceae (Cyanobacteria) have potential to produce highly diverse secondary metabolites that warrant further investigation.